Too many things:
frames
back end for FITS table files (perhaps ocnvert ARF to CSV to get some
more experience with frames?)?Any of these require actual parsing of the header data...
So, back to the ARF file from the previous notebook.
I should note that up-to-date information on the FITS standard is available!
Unrelated ideas (although they do involve parsing, mainly XML responses from a web server):
given a name, call a name resolver (e.g. CADC) to get a location; then Chandra footprint server (which has images). Now, name server isn't actually needed.
explain @astroprop code
In [1]:
import qualified Data.ByteString.Char8 as B8
cts <- B8.readFile "../data/src.arf"
chunk :: Int
chunk = 2880
splitBS :: B8.ByteString -> Int -> Int -> B8.ByteString
splitBS inBS s n = B8.take n (B8.drop s inBS)
Cheat since we know how long the header is...
In [2]:
hdrBS = splitBS cts chunk (6*chunk)
In [3]:
B8.take 80 hdrBS
B8.drop (6*chunk-80) hdrBS
Re-use some of the code. However, let's tweak the types:
XXX probably drop this, as don't really take it anywhere and have other types that can be used to represent "type safety" (e.g. Keyword).
In [4]:
newtype Card = Card B8.ByteString
deriving (Eq, Show)
-- | Given a string, split off the next 80 characters.
getCard1 :: B8.ByteString -> (Card, B8.ByteString)
getCard1 bs = let (ls, rs) = B8.splitAt 80 bs
in (Card ls, rs)
With this, I can not mix up the unparsed data with a card, since they are a different
type (previously I used type which is just a synonym).
In [5]:
(a,_) = getCard1 hdrBS
a
In [6]:
getCard1 a
XXX hmmm; this doesn't really show off the type safety. may have to wait until have something more;
perhaps the parse-a-header unit. Of course, the Card stuff is likely a digression if the aim is to use a parser combinator library
a la attoparsec.
I can continue my "simplification mania":
In [7]:
import Control.Arrow (first)
In [8]:
:type first
In [9]:
:type first not (True,"False")
In [10]:
first not (True,"False")
In [11]:
-- | Given a string, split off the next 80 characters. The Arrow version.
getCard :: B8.ByteString -> (Card, B8.ByteString)
getCard = first Card . B8.splitAt 80
In [12]:
fst (getCard hdrBS)
XXX Insert a quite hilarious joke about fst and first.
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
How about attoparsec => drop the card business?
In [13]:
import Data.Attoparsec.ByteString.Char8 hiding (take)
import Control.Applicative ((<*), (*>), (<|>), (<$>), optional)
Let's start with a keyword: 1 to 8 alphanumeric-ish characters:
In [14]:
-- simplify rules
keywordStart :: Parser Char
keywordStart = satisfy (inClass ['A' .. 'Z'])
In [15]:
parse keywordStart (B8.pack "X")
In [16]:
Done leftover parseval = parse keywordStart (B8.pack "XTENSION= 'blah blah'")
In [17]:
parseval
In [18]:
leftover
In [19]:
parseOnly keywordStart (B8.pack "Q")
In [20]:
parseOnly keywordStart (B8.pack "q")
In [21]:
parseOnly keywordStart (B8.pack "QQ")
In [22]:
parseOnly (keywordStart <* endOfInput) (B8.pack "QQ")
In [23]:
keywordBody1 :: Parser Char
keywordBody1 = satisfy (inClass (['A' .. 'Z'] ++ ['0' .. '9'] ++ "_-"))
In [24]:
keywordBody :: Parser Char
keywordBody = keywordStart <|> satisfy (inClass ['0'..'9']) <|> satisfy (\c -> c == '_' || c == '-')
In [25]:
parseOnly keywordBody (B8.pack "6")
In [26]:
keywordUnbounded :: Parser String
keywordUnbounded = do
fchar <- keywordStart
body <- many' keywordBody
return (fchar : body)
In [27]:
parseOnly keywordUnbounded (B8.pack "XTENSION= blah blah")
In [28]:
parseOnly keywordUnbounded (B8.pack "XTENSIONKEYWORD = blah blah")
In [29]:
-- Parse 0 up to n copies of the parser, returning the results.
--
-- It is expected that the counter is 0 or more.
upTo :: Int -> Parser a -> Parser [a]
upTo n p | n < 1 = return []
| otherwise = do
mx <- optional p
case mx of
Just x -> do
xs <- upTo (n-1) p
return (x : xs)
_ -> return []
In [30]:
:type optional
In [31]:
parseOnly (upTo 3 (char 'X')) (B8.pack "XX")
In [32]:
parseOnly (upTo 3 (char 'X')) (B8.pack "XXXX X")
In [33]:
parseOnly (upTo 3 (char 'X')) (B8.pack "YXXX")
In [34]:
-- Is keyword type useful here?
--
-- Ord is needed for the Map
newtype Keyword = Keyword String deriving (Eq, Ord, Show)
-- Note: this parses the first 8 characters (i.e. requires
-- spaces after "short" keywords).
--
keyword :: Parser Keyword
keyword = do
fchar <- keywordStart
body <- upTo 7 keywordBody
let res = fchar : body
nres = length res
count (8-nres) space -- ignore the return value
return (Keyword (fchar : body))
In [35]:
testBS = (B8.take 40 (B8.drop 80 hdrBS)) -- want to show the remainder, and a < 8 char keyword
Done remainder kword = parse keyword testBS
In [36]:
testBS
In [37]:
kword
In [38]:
remainder
What about values? Note: column lengths can be relaxed here (as there's a suggested fixed format but can also be relaxed):
'string' ... where '' is used to represent a
single quote character (there's also a minimum length of the string, since the trailing '
must be in column 20 or later, so trailing whitespace is ignored)T ... (or F) - a boolean, which is in column 30n ... - an integerf ... - a floating-point value, which may or may not be in exponential form, but if so
uses E.where the numeric values end in column 30 (I am ignoring complex numbers, and the ... part should be /...).
In [39]:
-- Should this store an Integer rather than Int?
--
data KeyValue = KeyString String | KeyFloat Double | KeyInt Int | KeyBool Bool deriving (Eq, Show)
In the following the skipSpace (before and handling of comments after) are left to the caller
In [40]:
-- | Remove trailing space characters
rstrip :: B8.ByteString -> B8.ByteString
rstrip = go
where
go bs = case B8.unsnoc bs of
Just (rbs,c) | c == ' ' -> go rbs
_ -> bs
In [41]:
stringVal :: Parser KeyValue
stringVal = do
let quote = '\''
char quote
body <- takeTill (== quote)
return (KeyString (B8.unpack (rstrip body)))
In [42]:
boolVal :: Parser KeyValue
boolVal =
let true = char 'T' *> return (KeyBool True)
false = char 'F' *> return (KeyBool False)
in true <|> false
In [43]:
parseOnly boolVal (B8.pack "T ")
parseOnly boolVal (B8.pack "F")
parseOnly boolVal (B8.pack "'T'")
Use
floatingOrInteger
from the
scientific
package -- Nopity nope nope since it takes 1.23E23 as an integer.
In [44]:
intVal1 :: Parser KeyValue
intVal1 = do
ans <- signed decimal
return (KeyInt ans)
intVal :: Parser KeyValue
intVal = KeyInt <$> signed decimal
In [45]:
parseOnly intVal (B8.pack "123")
parseOnly intVal (B8.pack "123.45")
parseOnly intVal (B8.pack "-123")
In [46]:
-- Support E+23 or E-23
exponential :: Parser Int
exponential = char 'E' *> signed decimal
-- this does not support 1E+23, it needs to be X.Y...
floatVal :: Parser KeyValue
floatVal = do
msign <- optional (char '-' <|> char '+')
ldigs <- many1' digit
char '.'
rdigs <- many1' digit
mexp <- optional exponential
let f = read (ldigs ++ '.' : rdigs)
g = case mexp of
Just eval | eval > 0 -> f * 10 ^ eval
| eval < 0 -> f / 10 ^ (abs eval)
| otherwise -> f
_ -> f
s = case msign of
Just '-' -> -1
_ -> 1
return (KeyFloat (s*g))
In [47]:
parseOnly floatVal (B8.pack "1.23E-43")
parseOnly floatVal (B8.pack "-0.0001E3")
parseOnly floatVal (B8.pack "123")
In [48]:
value :: Parser KeyValue
value = stringVal <|> boolVal <|> floatVal <|> intVal
In [49]:
map (parseOnly value . B8.pack) ["'A string ' ", "T", "123", "-1.23E+47"]
In [50]:
import Data.Maybe (fromMaybe)
-- Handle any description. I could also parse out any unit, but leave that for now.
comment :: Parser String
comment = do
skipSpace
mcbs <- optional (char '/' >> skipSpace >> many' anyChar)
-- TODO: strip trailing white space
return (fromMaybe "" mcbs)
In [51]:
-- Should the 80-character limitation be tracked by the parser, or can we split into chunks
-- beforehand?
card :: Parser (Keyword, KeyValue)
card = do
key <- keyword
-- have to have "= " but then need to also eat up any spaces
-- before the value.
char '=' >> many1' space
val <- value
_ <- comment
return (key, val)
In [52]:
splitBS hdrBS 0 80
splitBS hdrBS 80 80
parseOnly card (splitBS hdrBS 0 80)
parseOnly card (splitBS hdrBS 80 80)
In [53]:
import Data.List (unfoldr)
In [54]:
getCards :: B8.ByteString -> [B8.ByteString]
getCards = unfoldr go
where
go bs | B8.null bs = Nothing
| otherwise = Just (B8.splitAt 80 bs)
hdrCards = getCards hdrBS
In [55]:
parseOnly card (head hdrCards)
In [56]:
map (parseOnly card) (take 5 hdrCards)
In [57]:
mapM (parseOnly card) (take 5 hdrCards)
In [58]:
mapM (parseOnly card) hdrCards
In [59]:
import Data.Either (isLeft)
-- find the first card that fails (returns a Left value)
head (filter (isLeft . snd) (zip hdrCards (map (parseOnly card) hdrCards)))
So, good types can only take you so far in protecting yourself against errors in your data model.
In [60]:
import Control.Monad (void)
-- Skip blank lines, END lines, and those that are "metadata" lines (e.g. HISTORY/COMMENT)
fixedCard :: Parser (Maybe (Keyword, KeyValue))
fixedCard =
(do
key <- keyword
mchar <- optional (char '=')
case mchar of
Just _ -> do
many1' space
val <- value
void comment
return (Just (key, val))
_ -> many' anyChar >> return Nothing
)
<|>
(string (B8.pack "END") >> skipSpace >> return Nothing)
<|>
(skipSpace >> return Nothing)
In [61]:
head (filter (isLeft . snd) (zip hdrCards (map (parseOnly fixedCard) hdrCards)))
In [62]:
import qualified Data.Map as Map
import Data.Maybe (catMaybes)
Right mcs = mapM (parseOnly fixedCard) hdrCards
hdrMap = Map.fromList (catMaybes mcs)
In [63]:
Map.lookup (Keyword "OBJECT") hdrMap
In [64]:
-- import qualified Data.Sequence as Seq -- no longer using
In [65]:
-- Represent a binary table
--
-- Could the bitpix be a phantom argument ? Does this even make sense?
--
-- Assumes:
-- XTENSION=BINTABLE, NAXIS=2, PCOUNT=0, GCOUNT=1
--
-- Stores (explicitly or implicitly):
-- BITPIX, NAXIS1, NAXIS2, TFIELDS
-- TTYPE/TFORM/TUNIT<n> values for columns
--
data TableExt = TableExt {
_teBitPix :: Int
, _teNRows :: Int
, _teNCols :: Int
, _teWidth :: Int -- width, in bytes, of each row
, _teCols :: [ColData] -- would be nice to statically enforce size == _teNCols
} deriving Show
-- This stores the TTYPEn, TFORNm, and TUNITn keywords. Would it be
-- useful to encode the Haskell type for the data in the type?
--
-- At present the description/comment for the field is not included
data ColData = ColData {
_cdName :: String
, _cdFormat :: String -- the FITS encoding of the data type
, _cdUnit :: Maybe String
} deriving (Eq, Show)
In [66]:
import Control.Monad (forM, unless)
import Data.Either (either)
In [67]:
-- Note that a function is sent in, rather than the map.
createColumn :: (String -> Either String String) -> Int -> Either String ColData
createColumn getS c = do
let cs = show c
cName <- getS ("TTYPE" ++ cs)
cForm <- getS ("TFORM" ++ cs)
-- Need to swap from Either to Maybe; note that I do not want to
-- use cUnit <- getS ... otherwise the whole computation would
-- fail if no unit was given.
let cUnit = either (const Nothing) Just (getS ("TUNIT" ++ cs))
Right ColData { _cdName = cName, _cdFormat = cForm, _cdUnit = cUnit }
getKeyValue :: Map.Map Keyword KeyValue -> String -> Either String KeyValue
getKeyValue xs kn =
let err = "Key not found: " ++ kn
in maybe (Left err) Right (Map.lookup (Keyword kn) xs)
createTable :: Map.Map Keyword KeyValue -> Either String TableExt
createTable m = do
let get = getKeyValue m
asString _ (KeyString s) = Right s
asString k v = Left (k ++ " not a string: " ++ show v)
asInt _ (KeyInt i) = Right i
asInt k v = Left (k ++ " not an int: " ++ show v)
-- extract typed info
getS k = get k >>= asString k
getI k = get k >>= asInt k
ext <- getS "XTENSION"
naxis <- getI "NAXIS"
pcount <- getI "PCOUNT"
gcount <- getI "GCOUNT"
let conds = [ ext == "BINTABLE"
, naxis == 2
, pcount == 0
, gcount == 1 ]
unless (and conds) (Left ("invalid conditions: " ++ show conds))
bpix <- getI "BITPIX"
nbytes <- getI "NAXIS1"
nrows <- getI "NAXIS2"
ncols <- getI "TFIELDS"
-- Here cols is Either String [ColData], where the
-- string will represent the error from the fist
-- column which failed.
--
cols <- forM [1..ncols] $ createColumn getS
let lc = length cols
unless (lc == ncols) (Left ("Expected " ++ show ncols ++ " columns, found " ++ show lc))
Right TableExt {
_teBitPix = bpix
, _teNCols = ncols
, _teNRows = nrows
, _teWidth = nbytes
, _teCols = cols
}
In [68]:
createTable hdrMap
Now, the question is how to use this to parse the data: i.e. create a parser.
In [85]:
wrongMap = Map.insert (Keyword "TFIELDS") (KeyInt 4) hdrMap
In [86]:
Map.lookup (Keyword "TFIELDS") wrongMap
In [87]:
createTable wrongMap
The original map is unchanged:
In [88]:
Map.lookup (Keyword "TFIELDS") hdrMap
In [ ]: